Skip to content

AGT-02: Add tool registry, policy evaluator, and inbox triage template#502

Merged
Chris0Jeky merged 5 commits intomainfrom
agt/tool-registry-policy-template
Mar 29, 2026
Merged

AGT-02: Add tool registry, policy evaluator, and inbox triage template#502
Chris0Jeky merged 5 commits intomainfrom
agt/tool-registry-policy-template

Conversation

@Chris0Jeky
Copy link
Copy Markdown
Owner

Summary

Implements #337 — the second agent infrastructure slice, building on AGT-01's agent profile/run/event foundation.

  • Domain layer: ToolScope and ToolRiskLevel enums, ITaskdeckTool interface, ITaskdeckToolRegistry interface, PolicyDecision value object with factory methods (AllowWithReview, Deny, AllowDirect)
  • Application layer: IAgentPolicyEvaluator interface, AgentPolicyEvaluator service (allowlist enforcement, risk-level review gating, auto-apply OFF by default), TaskdeckToolRegistry (in-memory, thread-safe, case-insensitive), InboxTriageAssistant bounded template (gathers inbox items, creates proposals, never directly mutates board state)
  • Tests: 42 new tests covering tool registration/lookup/scope filtering, policy evaluation (allowlist, risk levels, disabled profiles, auto-apply opt-in), inbox triage (proposal creation, no direct mutation, policy routing, validation)

Key design decisions:

  • Review-first for ALL risk levels by default (including Low) — auto-apply only with explicit autoApplyLowRisk: true in profile policy JSON
  • Tool allowlist is opt-in: empty allowlist means all registered tools are available
  • InboxTriageAssistant routes all changes through the proposal system — zero direct board mutations
  • PolicyDecision is an immutable value object with clear factory methods

Test plan

  • dotnet build backend/Taskdeck.sln -c Release passes
  • dotnet test backend/Taskdeck.sln -c Release -m:1 passes (all existing + 42 new tests)
  • Architecture tests pass (no layer violations)
  • CI pipeline passes

…kTool, ITaskdeckToolRegistry interfaces, and PolicyDecision value object

Part of #337 (AGT-02). These domain primitives define the tool
abstraction, scope/risk classification, and policy decision contract
that the Application layer policy evaluator and tool registry build on.
Implements the Application layer for AGT-02 (#337):
- IAgentPolicyEvaluator interface for tool-use policy checks
- TaskdeckToolRegistry: in-memory, thread-safe tool registry
- AgentPolicyEvaluator: allowlist enforcement, risk-level review
  gating, auto-apply OFF by default for all risk levels
- TaskdeckToolDefinition: concrete ITaskdeckTool record
- InboxTriageAssistant: bounded template that creates proposals
  from pending inbox items, never directly mutating board state
…tant

42 tests covering:
- ToolRegistryTests: registration, duplicate rejection, lookup,
  case-insensitive keys, scope filtering
- AgentPolicyEvaluatorTests: allowlist enforcement, risk-level
  review gating (high/medium always review, low review by default),
  auto-apply opt-in, disabled profile denial, policy JSON parsing
- InboxTriageAssistantTests: proposal creation, no direct board
  mutation, policy routing, validation, edge cases
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Add missing `using` on JsonDocument.Parse to prevent resource leak.
Found during adversarial self-review.
@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Self-Review

Reviewed the full diff critically. Findings:

Fixed

  1. JsonDocument not disposed in AgentPolicyEvaluator.ParsePolicy -- JsonDocument implements IDisposable and was not wrapped in a using statement. Fixed in commit e4792c8.

Noted (acceptable)

  1. AgentPolicyConfig is internal -- This is intentional; it's a parsing implementation detail, not part of the public API. The ParsePolicy method is internal static to allow direct unit testing without exposing the config type publicly.

  2. InboxTriageAssistant uses GetByUserAsync then filters in-memory -- This loads all user LLM requests, not just pending ones. The existing ILlmQueueRepository interface does not expose a combined user+status filter. This is acceptable for now (inbox items per user are bounded) but should be addressed if item counts grow. A GetByUserAndStatusAsync method could be added in a follow-up.

  3. No DI registration -- The new services (TaskdeckToolRegistry, AgentPolicyEvaluator, InboxTriageAssistant) are not wired into the DI container in Program.cs or any startup class. This is intentional for this slice -- DI wiring will be added when the agent API endpoints are created (future issue). The services are fully unit-testable without DI.

  4. No denied_tools / blocklist support -- Policy only supports allowlists, not blocklists. This is simpler and more secure (deny-by-default when allowlist is specified). Can be extended later if needed.

No issues found in

  • Layer boundaries (Domain has no infra/app dependencies)
  • Architecture tests pass (8/8)
  • Naming consistency
  • Test coverage for edge cases (null keys, empty GUIDs, disabled profiles, malformed JSON)
  • Review-first invariant (InboxTriageAssistant never calls SaveChangesAsync directly)

@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Code Review — PR #502 (AGT-02)

I read every line of the diff across all 10 new files. Here are my findings, ordered by severity.


BUG: Missing DI Registration

backend/src/Taskdeck.Api/Extensions/ApplicationServiceRegistration.cs — None of the three new services are registered in the DI container:

  • ITaskdeckToolRegistry / TaskdeckToolRegistry (should be Singleton — it's an in-memory registry populated at startup)
  • IAgentPolicyEvaluator / AgentPolicyEvaluator (Scoped, since it depends on IUnitOfWork)
  • InboxTriageAssistant (Scoped)

Without these registrations, nothing in this PR can be resolved at runtime. Any controller or service depending on these will fail with a System.InvalidOperationException at DI resolution time.

BUG: No Tool Registration at Startup

InboxTriageAssistant.GetToolDefinition() exists but is never called during startup. There is no code that does:

registry.RegisterTool(InboxTriageAssistant.GetToolDefinition());

Without this, _toolRegistry.GetTool("inbox.triage") in the policy evaluator will always return null, and every triage request will be denied with "Tool 'inbox.triage' is not registered."

Both this and the DI registration above need startup wiring — likely in ApplicationServiceRegistration or Program.cs.


DESIGN: PolicyDecision Equality Semantics

backend/src/Taskdeck.Domain/Agents/PolicyDecision.csPolicyDecision overrides Equals(object?) and GetHashCode() but:

  • Does not implement IEquatable<PolicyDecision>
  • Does not override operator == / operator !=

This means decision1 == decision2 uses reference equality, but decision1.Equals(decision2) uses value equality. This is a classic C# gotcha. If this is intended as a value object (the XML doc says so), it should either:

  1. Be a record (which gets all of this for free), or
  2. Implement IEquatable<PolicyDecision> and override ==/!=

Recommendation: make it public sealed record PolicyDecision with a private constructor and keep the factory methods. Records give you immutability, value equality, ==/!=, ToString(), and deconstruction for free.

DESIGN: ITaskdeckToolRegistry in Domain Layer

backend/src/Taskdeck.Domain/Agents/ITaskdeckToolRegistry.cs — This interface contains void RegisterTool(ITaskdeckTool tool) which is a mutation/service operation. The Domain layer should be free of infrastructure service abstractions per CLAUDE.md ("Core entities and business rules. No infrastructure dependencies — keep it pure.").

Consider: Keep ITaskdeckTool and enums in Domain (they describe business concepts). Move ITaskdeckToolRegistry to Taskdeck.Application.Interfaces — the Application layer is where service interfaces belong. The evaluator already lives in Application and can reference it there.

This doesn't fail architecture tests today because the Domain files don't import forbidden namespaces, but it loosens the Domain purity principle.


MINOR: AgentPolicyConfig Mutability

backend/src/Taskdeck.Application/Services/AgentPolicyEvaluator.cs:168-173AgentPolicyConfig.AllowedTools is IReadOnlyList<string> but backed by new List<string>(). A caller could cast and mutate: ((List<string>)config.AllowedTools).Add("malicious.tool"). The default uses Array.Empty<string>() which is truly immutable. For consistency and defense-in-depth, wrap with .AsReadOnly() or use allowedTools.ToArray() in ParsePolicy.

Since this is internal, the practical risk is low.

MINOR: No Test for TruncateTitle Edge Cases

InboxTriageAssistant.TruncateTitle handles multi-line input, empty lines, and length truncation, but there are no direct tests for:

  • Empty string payload
  • Payload with only whitespace/newlines
  • Payload exceeding 200 characters on the first line
  • Payload that is exactly 200 characters

These are covered indirectly through integration tests but the method deserves its own unit test (it's private static so would need to be made internal with InternalsVisibleTo, or tested through the public API with crafted payloads).

MINOR: Deny Decision Semantics

PolicyDecision.Deny creates new(false, false, reason) — meaning RequiresReview = false on a denied action. Semantically, a denied action doesn't "not require review" — the concept is N/A. This isn't a functional bug (callers check Allowed first), but it could confuse future readers. Consider adding a comment or using a nullable bool? for RequiresReview.


What Looks Good

  • Thread safety: ConcurrentDictionary with StringComparer.OrdinalIgnoreCase in the registry is correct. TryAdd is atomic. Registration-time validation is clean.
  • Policy evaluation logic: The cascade (empty ID → unregistered tool → missing profile → disabled profile → allowlist → risk level) is thorough and correctly ordered. High/Medium risk always require review regardless of autoApplyLowRisk.
  • Review-first default: Low-risk tools default to review-first, auto-apply only with explicit opt-in. This aligns with the project's golden principle.
  • No direct board mutation: InboxTriageAssistant never calls SaveChangesAsync — all changes go through proposals. The test for this (RunTriage_ShouldNeverDirectlyMutateBoard) is well-designed.
  • JSON parsing resilience: ParsePolicy handles null, empty, "{}", malformed JSON, and missing properties gracefully, always returning safe defaults.
  • Test coverage: 42 tests covering all major paths. Mock setup is realistic. Edge cases for policy parsing are good.
  • Naming and conventions: PascalCase for publics, camelCase for locals, proper C# idioms throughout. Clean XML documentation.
  • Architecture purity: Application layer does not import Infrastructure or API. Domain files have no forbidden imports.

Verdict

The two DI/startup registration issues are blocking — without them, the entire feature is dead code that will fail at runtime. The PolicyDecision equality issue is a significant design concern that should be addressed before or shortly after merge to avoid future bugs when someone inevitably writes if (decision == expected).

The remaining items are minor and can be addressed in follow-up.

The tool registry, policy evaluator, and inbox triage assistant were
missing DI registrations, making the entire feature unresolvable at
runtime. Also registers the inbox.triage tool definition at startup
so the policy evaluator can find it.
@Chris0Jeky Chris0Jeky merged commit 0f38f82 into main Mar 29, 2026
18 checks passed
@Chris0Jeky Chris0Jeky deleted the agt/tool-registry-policy-template branch March 29, 2026 03:46
@github-project-automation github-project-automation bot moved this from Pending to Done in Taskdeck Execution Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant